For this workshop, we will be using R via RStudio.
You can think of R like a car’s engine, while RStudio is like a car’s dashboard.
ModernDive, Figure 1.1.
In other words, just as we drive a car through its dashboard rather than by interacting directly with the engine, we won’t be using R directly.
Instead, we will be using RStudio’s interface.
ModernDive, Figure 1.2.
After you open RStudio, you should see the following 3 panels:
ModernDive, Figure 1.3.
R packages extend the functionality of R by providing additional functions, data, and documentation.
ModernDive, Figure 1.4.
So let’s continue with this analogy: Let’s say you’ve purchased a new phone (brand new R/RStudio install) and you want to take a photo (do some data analysis) and share it with your friends and family. So you need to:
This process is very similar when you are using an R package. You need to install the package once, then load it in every new R session in which you want to use it:
install.packages("tidyverse")
library(tidyverse)
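Because installing is a one-time step while loading happens every session, it can be handy to check whether a package is already installed before calling install.packages(). The following is a minimal sketch in base R, not part of the original workshop material; the helper name is_installed and the package name notARealPackage123 are made up for illustration.

```r
# Hypothetical helper: TRUE when the named package can be found and loaded
# from this R library, FALSE otherwise.
is_installed <- function(pkg) {
  isTRUE(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with every R installation, so this check should succeed.
is_installed("stats")

# A made-up package name, used only to show the "not installed" case.
is_installed("notARealPackage123")

# The same pattern for a real package (commented out so nothing is downloaded
# when this sketch is run):
# if (!is_installed("tidyverse")) install.packages("tidyverse")
# library(tidyverse)
```

With this pattern, install.packages() only runs when it is needed, while library() is still called in each new session.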
See ModernDive Chapter 1 for further reading.
One day you will need to quit R, go do something else and return to your analysis later.
One day you will be running multiple analyses in R and you want to keep them separate.
One day you will need to bring data from the outside world into R and present results and figures from R back out to the world.
So how do you know which parts of your analysis are “real” and where does your analysis “live”?
The working directory is where R will look, by default, for files you ask it to load or to save.
You can explicitly check your working directory with:
getwd()
## [1] "/Users/amylee/Dropbox/R_Resources/Introduction-to-R/scripts"
It is also displayed at the top of the RStudio console.
Figure 1.5. Find my path
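To see what "where R will look by default" means in practice, here is a small sketch in base R of how a relative path is interpreted against the working directory. The folder name data and the file name raw_foofy_data.csv are only placeholders; the file does not need to exist for the sketch to run.

```r
# The current working directory, as a single character string.
wd <- getwd()

# A relative path: it does not start at the root of the file system.
relative_path <- file.path("data", "raw_foofy_data.csv")

# R interprets a relative path as starting from the working directory,
# so this is the full location R would actually look at.
resolved_path <- file.path(wd, relative_path)
print(resolved_path)

# Asks whether such a file exists relative to the working directory.
file.exists(relative_path)
```

If file.exists() returns FALSE for a file you know you have, the working directory is usually not what you expected.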
DO NOT USE setwd() unless you want Jenny Bryan to set your computer on fire!
Figure 1.6. Don’t setwd()
Figure 1.7. Don’t setwd()
So what’s wrong with:
setwd("/Users/amy/fuzzy_alpaca/cute_animals/foofy/data")
df <- read.delim("raw_foofy_data.csv")
p <- ggplot(df, aes(x, y)) + geom_point()
ggsave("../figs/foofy_scatterplot.png")
The chance of the setwd() command having the desired effect - making the file paths work - for anyone besides its author is 0%. It might not even work for the author a year or two from now. So essentially your data analysis project is not self-contained and portable, which makes recreating the plot impossible.
Read more here: https://www.tidyverse.org/articles/2017/12/workflow-vs-script/
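One way to avoid setwd() is to build every path relative to the top-level folder of the project, which is the working directory RStudio sets when you open an RStudio Project. The following is a minimal sketch under some assumptions: a temporary folder stands in for the project root, the data are made up, and base R plotting to a PDF replaces ggplot2 so that the example runs without extra packages. The here package provides here::here() for building project-relative paths in a similar spirit.

```r
# Stand-in for a project root. In a real RStudio Project this would simply be
# the working directory, and paths could start at "data" or "figs" directly.
project_root <- file.path(tempdir(), "foofy")
dir.create(file.path(project_root, "data"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(project_root, "figs"), showWarnings = FALSE)

# Paths are built from the project root, never hard-coded to one person's machine.
data_path <- file.path(project_root, "data", "raw_foofy_data.csv")
fig_path <- file.path(project_root, "figs", "foofy_scatterplot.pdf")

# Made-up data so the sketch is self-contained.
write.csv(data.frame(x = 1:5, y = c(2, 4, 6, 8, 10)), data_path, row.names = FALSE)

# Read the data back and save a scatterplot, all without calling setwd().
df <- read.csv(data_path)
pdf(fig_path)
plot(df$x, df$y)
invisible(dev.off())
```

Because nothing in the script depends on a path like /Users/amy/..., the whole project folder can be moved or shared and the code should still find its files.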
Typically, I organize each data analysis into a project using an RStudio Project. I tend to have a directory each for: